Simplex NPs Clustered by Head: A Method for Identifying Significant Topics Within a Document

Author

  • Nina Wacholder
Abstract

This paper discusses 'head clustering', a novel, linguistically motivated method for representing the aboutness of a document. First, a list of candidate significant topics consisting of all simplex NPs is extracted from the document. Next, these NPs are clustered by head. Finally, a significance measure is obtained by ranking the heads by frequency: NPs whose heads occur more frequently in the document are considered more significant than NPs whose heads occur less frequently. An important strength of this technique is that it is in principle domain-general. Furthermore, the output can be filtered in a variety of ways, both for automatic processing and for presentation to users. To evaluate the head clustering method, an experiment was conducted in which judges were asked to rate three lists as to whether they conveyed a sense of the content of the article. The judges agreed that the list of simplex NPs with repeated heads was more helpful in representing the content of the full document than either a list of keywords with a frequency greater than one or a list of repeated word sequences.
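The clustering and ranking steps described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation. It assumes the simplex NPs have already been extracted by some upstream chunker, and it approximates the head of each NP as its last whitespace-separated token, which is a simplification for English. The function names `cluster_by_head` and `repeated_heads` and the sample phrases are hypothetical.

```python
from collections import defaultdict


def cluster_by_head(noun_phrases):
    """Group noun phrases by head and rank the groups by head frequency.

    Assumption: the head is approximated as the last token of the phrase.
    """
    clusters = defaultdict(list)
    for phrase in noun_phrases:
        tokens = phrase.lower().split()
        if not tokens:
            continue  # skip empty strings
        clusters[tokens[-1]].append(phrase)
    # Most frequent heads first; ties are broken alphabetically by head.
    return sorted(clusters.items(), key=lambda item: (-len(item[1]), item[0]))


def repeated_heads(ranked, min_count=2):
    """Keep only clusters whose head occurs at least min_count times."""
    return [(head, nps) for head, nps in ranked if len(nps) >= min_count]


# Hypothetical simplex NPs, listed in document order.
example_nps = [
    "the stock market",
    "a bear market",
    "interest rates",
    "the market",
    "mortgage rates",
    "the chairman",
]

ranked = cluster_by_head(example_nps)
for head, nps in repeated_heads(ranked):
    print(head, len(nps), nps)
```

Filtering to heads that occur more than once mirrors the "simplex NPs with repeated heads" list that the judges rated in the evaluation. A real system would need a proper head finder rather than the last-token shortcut used here.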

Similar resources

Terminological Variation, a Means of Identifying Research Topics from Texts

After extracting terms from a corpus of titles and abstracts in English, syntactic variation relations are identified amongst them in order to detect research topics. Three types of syntactic variation were studied: permutation, expansion and substitution. These syntactic variations yield other relations of a formal and conceptual nature. Based on a distinction of the variation relations accor...

A Probabilistic Topic Model Based on Local Word Relationships in Overlapping Windows

A probabilistic topic model assumes that documents are generated through a process involving topics and then tries to reverse this process, given the documents and extract topics. A topic is usually assumed to be a distribution over words. LDA is one of the first and most popular topic models introduced so far. In the document generation process assumed by LDA, each document is a distribution o...

TopicRank: Graph-Based Topic Ranking for Keyphrase Extraction

Keyphrase extraction is the task of identifying single or multi-word expressions that represent the main topics of a document. In this paper we present TopicRank, a graph-based keyphrase extraction method that relies on a topical representation of the document. Candidate keyphrases are clustered into topics and used as vertices in a complete graph. A graph-based ranking model is applied to assi...

Identifying Flow Units Using an Artificial Neural Network Approach Optimized by the Imperialist Competitive Algorithm

The spatial distribution of petrophysical properties within the reservoirs is one of the most important factors in reservoir characterization. Flow units are the continuous body over a specific reservoir volume within which the geological and petrophysical properties are the same. Accordingly, an accurate prediction of flow units is a major task to achieve a reliable petrophysical description o...

Magnetic nanoparticle clusters for photothermal therapy with near-infrared irradiation.

In this study, the photothermal effect of magnetic nanoparticle clusters is reported for the first time for the photothermal ablation of tumors, both in vitro in cellular systems and in vivo. Compared with individual magnetic Fe3O4 nanoparticles (NPs), clustered Fe3O4 NPs can result in a significant increase in the near-infrared (NIR) absorption. Upon NIR irradiation at 808 nm, clustered Fe3O4 ...

Publication date: 1998